Fast Approximate Motif Statistics
Abstract
This article presents a fast approximate method for computing the statistics of the number of non-self-overlapping matches of motifs in a random text under the nonuniform Bernoulli model. The method is well suited to protein motifs, where the probability that a motif overlaps itself is small. For 96% of the PROSITE motifs, the approximate method computes the expected number of occurrences in a random database of 7 million amino acids with less than 1% error relative to the exact method. Processing the whole of PROSITE takes about 30 seconds with the approximate method. We apply the new method to a comparison of the C. elegans and S. cerevisiae proteomes.
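As a rough illustration of the kind of quantity involved, the sketch below computes the expected number of occurrences of a simple fixed-length motif in a random text whose letters are drawn independently with nonuniform frequencies. It is not the method of the article: it uses only linearity of expectation, counts every occurrence (overlapping or not), and does not handle the variable-length gaps found in PROSITE patterns. The function names, the toy four-letter alphabet, and the frequencies are assumptions made for the example.

```python
from math import prod


def match_probability(motif, freqs):
    """Probability that the motif matches at one fixed text position.

    motif: list of sets; each set holds the letters allowed at that position.
    freqs: dict mapping each letter to its probability (nonuniform Bernoulli
           model: letters are independent, with unequal frequencies).
    """
    # Positions are independent, so the per-position probabilities multiply.
    return prod(sum(freqs[letter] for letter in allowed) for allowed in motif)


def expected_occurrences(motif, freqs, text_length):
    """Expected count of (possibly overlapping) matches in a random text."""
    motif_length = len(motif)
    if text_length < motif_length:
        return 0.0
    # Linearity of expectation over the text_length - motif_length + 1 start positions.
    return (text_length - motif_length + 1) * match_probability(motif, freqs)


# Toy alphabet and frequencies (hypothetical, chosen to sum to 1).
toy_freqs = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
# Toy motif: A, then C or G, then A or T.
toy_motif = [{"A"}, {"C", "G"}, {"A", "T"}]

print(match_probability(toy_motif, toy_freqs))
print(expected_occurrences(toy_motif, toy_freqs, 10))
```

When a motif rarely overlaps itself, the count of all occurrences and the count of non-self-overlapping matches differ only slightly, which is the regime the abstract describes for protein motifs.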
Similar resources
Approximate Dynamic Analysis of Structures for Earthquake Loading Using FWT
Approximate dynamic analysis of structures is achieved by the fast wavelet transform (FWT). The loads are considered as time-history earthquake loads. To reduce the computational work, FWT is used to reduce the number of points in the earthquake record. For this purpose, the theory of wavelets together with filter banks is used. The low and high pass filters are used for the decompositi...
Analysis and design of approximate queries over XML documents using statistical techniques
In the last few years, several repositories for storing XML documents and languages for querying XML data have been studied and implemented. All the query languages proposed so far return exact answers, but when applied to large XML repositories or warehouses, such precise queries may require high response times. To overcome this problem, in traditional relational warehouses fast approx...
Computational Methods For Functional Motif Identification and Approximate Dimension Reduction in Genomic Data
Computational Methods For Functional Motif Identification and Approximate Dimension Reduction in Genomic Data by Stoyan Georgiev Department of Computational Biology and Bioinformatics Duke University
Inferring Higher-Order Structure Statistics of Large Networks From Sampled Edges
Recently, exploring higher-order organizational patterns (e.g., locally connected subgraphs, also known as motifs or graphlets) of complex networks such as online social networks and communication networks has attracted a lot of attention. Previous work made the strong assumption that the graph topology of interest is known in advance. In practice, sometimes researchers have to deal with the situatio...
Approximate Bayesian inference in spatial GLMM with skew normal latent variables
Spatial generalized linear mixed models are common in applied statistics. Most users are satisfied using a Gaussian distribution for the spatial latent variables in this model, but it is unclear whether the Gaussian assumption holds. Wrong Gaussian assumptions cause bias in parameter estimates and affect the accuracy of spatial predictions. Thus, there is a need for more flexible priors for the...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 8, issue 3
Pages: -
Publication date: 2001